Skip to content

Conversation

@fedexman
Copy link

Fix media type inference for URLs with query parameters

When using presigned URLs (eg AWS S3) with ImageUrl, AudioUrl, or VideoUrl, the media type inference fails

https://pics.s3.ap-northeast-1.amazonaws.com/test/Capture-2025-11-21-112402.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=fdafdas%2F20251121%2Fap-northeast-1%2Fs3%2Faws4_request&X-Amz-Date=20251121T023200Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEDoaD

to the agent we get this error,

Internal server error: Could not infer media type from image URL: 
...
Explicitly provide a `media_type` instead

the reason is that the _infer_media_type function only check the end of the url with url.endswith('.mkv') but do not parse the url.

I propose to parse the url with

from urllib.parse import urlparse
path = urlparse(self.url).path
if path.endswith('.mkv'):
            return 'video/x-matroska'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we using the mimetypes stdlib module? mimetypes.guess_type() already parses URLs and the current implementation doesn't take into account case insensitivity, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Viicos Interestingly we already use that in DocumentUrl._infer_media_type, after checking a bunch of types ourselves :/

@fedexman Can you see if we can use mimetypes.guess_type() for all of these?

The method can be changed to just return str rather than XMediaType, as I don't think that type is used on any public fields.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Viicos Interestingly we already use that in DocumentUrl._infer_media_type, after checking a bunch of types ourselves :/

@fedexman Can you see if we can use mimetypes.guess_type() for all of these?

The method can be changed to just return str rather than XMediaType, as I don't think that type is used on any public fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants